Computer Vision

"Computer vision is an interdisciplinary field that deals with how computers can be made for gaining high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do." ˆHuang, T. (1996-11-19). Vandoni, Carlo, E, ed. Computer Vision : Evolution And Promise

"Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding." ˆhttp://www.bmva.org/visionoverview The British Machine Vision Association and Society for Pattern Recognition Retrieved February 20, 2017

Image source: https://medium.com/readers-writers-digest/beginners-guide-to-computer-vision-23606224b720

Computer vision is used today in a wide variety of real-world applications, including:

  • Optical character recognition (OCR): reading handwritten postal codes on letters (Figure 1.4a) and automatic number plate recognition (ANPR);
  • Machine inspection: rapid parts inspection for quality assurance using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts (Figure 1.4b) or looking for defects in steel castings using X-ray vision;
  • Retail: object recognition for automated checkout lanes (Figure 1.4c);
  • 3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs used in systems such as Bing Maps;
  • Medical imaging: registering pre-operative and intra-operative imagery (Figure 1.4d) or performing long-term studies of people’s brain morphology as they age;
  • Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well (Figure 1.4e; see also Miller, Campbell, Huttenlocher et al. (2008); Montemerlo, Becker, Bhat et al. (2008); Urmson, Anhalt, Bagnell et al. (2008) for examples of fully automated driving);
  • Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source video to estimate the 3D camera motion and shape of the environment. Such techniques are widely used in Hollywood (e.g., in movies such as Jurassic Park) (Roble 1999; Roble and Zafar 2009); they also require the use of precise matting to insert new elements between foreground and background elements (Chuang, Agarwala, Curless et al. 2002).

Known applications of computer vision

  • Motion capture (mocap): using retro-reflective markers viewed from multiple cameras or other vision-based techniques to capture actors for computer animation;
  • Surveillance: monitoring for intruders, analyzing highway traffic (Figure 1.4f), and monitoring pools for drowning victims;
  • Fingerprint recognition and biometrics: for automatic access authentication as well as forensic applications.

ˆ Szeliski R. (2010). Computer Vision: Algorithms and Applications. Springer

Different Tasks in Computer Vision

1. Image Classification : Task of assigning an input image one label from a fixed set of categories

Paper : https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Animal classification

Classification in ImageNet : $$e = \frac{1}{n} \sum_{k} \min_{j} d(l_{j}, g_{k}) \tag{1}$$
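As a rough illustration of Eq. (1), the sketch below computes the error for a single image. It assumes the usual ILSVRC reading of the symbols, which the text above does not spell out: $l_j$ are the labels predicted by the model (up to five per image in the challenge), $g_k$ are the $n$ ground-truth labels, and $d(x, y)$ is 0 when the two labels match and 1 otherwise. The function name and the example labels are made up for illustration.

```python
def ilsvrc_image_error(predicted, ground_truth):
    """Per-image error in the spirit of Eq. (1).

    Assumption: d(x, y) is 0 if the labels are equal and 1 otherwise.
    """
    def d(x, y):
        return 0 if x == y else 1

    n = len(ground_truth)
    # For each ground-truth label, take the best (smallest) mismatch
    # against any predicted label, then average over ground-truth labels.
    return sum(min(d(l, g) for l in predicted) for g in ground_truth) / n


# Hypothetical top-5 predictions for one image.
preds = ["tabby cat", "tiger cat", "lynx", "red fox", "dog"]

err_hit = ilsvrc_image_error(preds, ["tiger cat"])        # label is among the guesses
err_miss = ilsvrc_image_error(preds, ["hotdog"])          # label is not among the guesses
err_half = ilsvrc_image_error(preds, ["lynx", "hotdog"])  # one of two labels is found
```

Under these assumptions, an image counts as correct as soon as any one of the predicted labels matches a ground-truth label, which is why this measure is often called top-5 error.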

The image classification pipeline :

  1. Input : Training set
  2. Learning : Classifier / model
  3. Evaluation : Intuitively, we hope that many of the model's predictions match the true answers (which we call the ground truth).
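The three pipeline steps can be sketched with a toy nearest-neighbour classifier. This is only a minimal illustration and not the method of the paper linked above: the "images" are hypothetical two-number feature vectors, and the function names `train`, `predict` and `accuracy` are invented for this example.

```python
import math


def train(points, labels):
    # Learning: a nearest-neighbour "model" simply memorises the training set.
    return list(zip(points, labels))


def predict(model, point):
    # Return the label of the stored example closest to the query point.
    nearest = min(model, key=lambda example: math.dist(example[0], point))
    return nearest[1]


def accuracy(model, points, labels):
    # Evaluation: fraction of predictions that match the ground truth.
    correct = sum(predict(model, p) == y for p, y in zip(points, labels))
    return correct / len(labels)


# Input: a tiny, made-up training set and a separate test set.
train_x = [(0.1, 0.2), (0.0, 0.4), (0.9, 0.8), (1.0, 0.7)]
train_y = ["cat", "cat", "dog", "dog"]
test_x = [(0.2, 0.1), (0.8, 0.9), (0.6, 0.6)]
test_y = ["cat", "dog", "cat"]

model = train(train_x, train_y)
predictions = [predict(model, p) for p in test_x]
test_accuracy = accuracy(model, test_x, test_y)
```

Evaluating on examples that were not used for training is what makes the accuracy a meaningful estimate of how the model behaves on new images.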

2. Object Localization

Figures: object localization examples, including vehicle detection

3. Object Detection

Paper : https://arxiv.org/abs/1512.03385

"In object detection it is possible to detect more than 1 object"

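In [1]:
import base64
from IPython.display import HTML

# Read the clip as raw bytes (read-only binary mode) and base64-encode it
# so that it can be embedded directly in the notebook output.
with open('./media/CV/MOVIE1.MOV', 'rb') as f:
    encoded = base64.b64encode(f.read()).decode('ascii')

# Render an HTML5 video player with the clip embedded as a data URI.
HTML(data='''<center><video alt="test" controls style="width: 300px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video></center>'''.format(encoded))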
Out[1]:

4. Object Segmentation

Paper : https://arxiv.org/abs/1407.1808

More computer vision tasks

  • Object tracking
  • Image & language
    • Image Captioning
    • Video Captioning
    • Question Answering
  • Image Generation
  • Visual Analogy
  • Action Detection
  • Artistic Style
  • Face Recognition
  • Facial Landmark Detection
  • Autonomous Self-Driving Car
  • etc

Not hotdog app

Want to try a challenge? :)

Large Scale Visual Recognition Challenge (ILSVRC)